Smart Paper Notes

Exploring Autoencoder-based Error-bounded Compression for Scientific Data

Jinyang Liu , Sheng Di , Kai Zhao , Sian Jin , Dingwen Tao , Xin Liang , Zizhong Chen , Franck Cappello

Categories: Machine Learning | Artificial Intelligence

2021-05-25

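Error-bounded lossy compression is becoming an indispensable technique for the success of today's scientific projects, which produce vast volumes of data during simulations or instrument data acquisition. It can not only reduce data size significantly but also control the compression error according to a user-specified error bound. Autoencoder (AE) models have been widely used in image compression, but few AE-based compression approaches support the error-bounding feature that scientific applications require. To address this issue, we explore using convolutional autoencoders to improve error-bounded lossy compression for scientific data, with the following three key contributions. (1) We provide an in-depth investigation of the characteristics of various autoencoder models and develop an error-bounded autoencoder-based framework built on the SZ model. (2) We optimize the compression quality of the main stages in our AE-based error-bounded compression framework, fine-tuning the block size and latent size and optimizing the compression efficiency of the latent vectors. (3) We evaluate our proposed solution on five real-world scientific datasets and compare it with six other related works. Experiments show that our solution exhibits very competitive compression quality among all the compressors in our tests. In absolute terms, compared with SZ2.1 and ZFP, it obtains much better compression quality in high-compression-ratio cases (a 100% to 800% improvement in compression ratio at the same data distortion).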
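As a rough illustration of what "error-bounded" means here, the sketch below quantizes the residual between the data and a lossy prediction into integer bins of width twice the error bound, which is the general idea behind SZ-style compressors. It is not the authors' implementation: `encode_latent` and `decode_latent` are hypothetical stand-ins for an autoencoder's encoder and decoder (here simply rounding to one decimal place), and entropy coding of the integer codes is omitted.

```python
import random


def encode_latent(data, decimals=1):
    # Hypothetical stand-in for an autoencoder encoder: a coarse, lossy summary.
    return [round(x, decimals) for x in data]


def decode_latent(latent):
    # Hypothetical stand-in for an autoencoder decoder: the prediction is
    # reproducible from the latent alone, so the decompressor can rebuild it.
    return list(latent)


def quantize_residuals(data, predicted, error_bound):
    # Map each residual (data - prediction) to an integer bin of width 2 * error_bound.
    step = 2.0 * error_bound
    return [round((x - p) / step) for x, p in zip(data, predicted)]


def reconstruct(predicted, codes, error_bound):
    # Invert the quantization: prediction plus the centre of the residual bin.
    step = 2.0 * error_bound
    return [p + q * step for p, q in zip(predicted, codes)]


random.seed(0)
data = [random.uniform(-1.0, 1.0) for _ in range(1000)]
error_bound = 1e-3

# Compression side: keep the latent and the integer residual codes.
latent = encode_latent(data)
codes = quantize_residuals(data, decode_latent(latent), error_bound)

# Decompression side: only the latent, the codes, and the error bound are needed.
restored = reconstruct(decode_latent(latent), codes, error_bound)
max_error = max(abs(x - r) for x, r in zip(data, restored))
print(f"max pointwise error: {max_error:.3e} (bound {error_bound:.1e})")
```

The better the prediction, the more the integer codes cluster around zero and the more cheaply they can be entropy coded, which is why the quality of the predictor matters for the compression ratio.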

Task-Guided IRL in POMDPs that Scales

Franck Djeumou , Christian Ellis , Murat Cubuktepe , Craig Lennon , Ufuk Topcu

Categories: Machine Learning | Artificial Intelligence

2022-12-30

In inverse reinforcement learning (IRL), a learning agent infers a reward function encoding the underlying task using demonstrations from experts. However, many existing IRL techniques make the often unrealistic assumption that the agent has access to full information about the environment. We remove this assumption by developing an algorithm for IRL in partially observable Markov decision processes (POMDPs). We address two limitations of existing IRL techniques. First, they require an excessive amount of data due to the information asymmetry between the expert and the learner. Second, most of these IRL techniques require solving the computationally intractable forward problem -- computing an optimal policy given a reward function -- in POMDPs. The developed algorithm reduces the information asymmetry while increasing the data efficiency by incorporating task specifications expressed in temporal logic into IRL. Such specifications may be interpreted as side information available to the learner a priori in addition to the demonstrations. Further, the algorithm avoids a common source of algorithmic complexity by building on causal entropy as the measure of the likelihood of the demonstrations as opposed to entropy. Nevertheless, the resulting problem is nonconvex due to the so-called forward problem. We solve the intrinsic nonconvexity of the forward problem in a scalable manner through a sequential linear programming scheme that is guaranteed to converge to a locally optimal policy. In a series of examples, including experiments in a high-fidelity Unity simulator, we demonstrate that even with a limited amount of data and POMDPs with tens of thousands of states, our algorithm learns reward functions and policies that satisfy the task while inducing similar behavior to the expert by leveraging the provided side information.

MINION: a Large-Scale and Diverse Dataset for Multilingual Event Detection

Amir Pouran Ben Veyseh , Minh Van Nguyen , Franck Dernoncourt , Thien Huu Nguyen

Categories: Natural Language Processing

2022-11-11

Event Detection (ED) is the task of identifying and classifying trigger words of event mentions in text. Despite considerable research efforts in recent years for English text, the task of ED in other languages has been significantly less explored. Switching to non-English languages, important research questions for ED include how well existing ED models perform on different languages, how challenging ED is in other languages, and how well ED knowledge and annotation can be transferred across languages. To answer those questions, it is crucial to obtain multilingual ED datasets that provide consistent event annotation for multiple languages. There exist some multilingual ED datasets; however, they tend to cover a handful of languages and mainly focus on popular ones. Many languages are not covered in existing multilingual ED datasets. In addition, the current datasets are often small and not accessible to the public. To overcome those shortcomings, we introduce a new large-scale multilingual dataset for ED (called MINION) that consistently annotates events for 8 different languages; 5 of them have not been supported by existing multilingual datasets. We also perform extensive experiments and analysis to demonstrate the challenges and transferability of ED across languages in MINION, which together call for more research effort in this area.

MEE: A Novel Multilingual Event Extraction Dataset

Amir Pouran Ben Veyseh , Javid Ebrahimi , Franck Dernoncourt , Thien Huu Nguyen

Categories: Natural Language Processing

2022-11-11

Event Extraction (EE) is one of the fundamental tasks in Information Extraction (IE) that aims to recognize event mentions and their arguments (i.e., participants) from text. Due to its importance, extensive methods and resources have been developed for Event Extraction. However, one limitation of current EE research is the under-exploration of non-English languages, for which the lack of high-quality multilingual EE datasets for model training and evaluation has been the main hindrance. To address this limitation, we propose a novel Multilingual Event Extraction dataset (MEE) that provides annotation for more than 50K event mentions in 8 typologically different languages. MEE comprehensively annotates data for entity mentions, event triggers and event arguments. We conduct extensive experiments on the proposed dataset to reveal challenges and opportunities for multilingual EE.

User-Entity Differential Privacy in Learning Natural Language Models

Phung Lai , NhatHai Phan , Tong Sun , Rajiv Jain , Franck Dernoncourt , Jiuxiang Gu , Nikolaos Barmpalios

Categories: Natural Language Processing | Machine Learning

2022-11-01

In this paper, we introduce a novel concept of user-entity differential privacy (UeDP) to provide formal privacy protection simultaneously to both sensitive entities in textual data and data owners in learning natural language models (NLMs). To preserve UeDP, we developed a novel algorithm, called UeDP-Alg, optimizing the trade-off between privacy loss and model utility with a tight sensitivity bound derived from seamlessly combining user and sensitive entity sampling processes. An extensive theoretical analysis and evaluation show that our UeDP-Alg outperforms baseline approaches in model utility under the same privacy budget consumption on several NLM tasks, using benchmark datasets.

Tutorial Recommendation for Livestream Videos using Discourse-Level Consistency and Ontology-Based Filtering

Amir Pouran Ben Veyseh , Franck Dernoncourt , Thien Huu Nguyen

Categories: Natural Language Processing

2022-09-11

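Streaming videos is one of the ways creators share their creative work with their audience. In these videos, streamers show how they achieve their final goal by using various tools in one or several programs for creative projects, and they may discuss the steps required to reach that goal. As such, these videos can provide substantial educational content for learning how to use the tools the streamer employs. One drawback, however, is that the streamer might not provide enough detail for every step, so learners may find it difficult to keep up with all the steps. One way to alleviate this issue is to link streaming videos with the relevant tutorials available for the tools used in them. More specifically, a system can analyze the content of a livestream video and recommend the most relevant tutorials. Since existing document recommendation models cannot handle this setting, in this work we present a novel dataset and model for tutorial recommendation for livestream videos. We conduct extensive analyses on the proposed dataset and model, revealing the challenging nature of this task.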

Improving Keyphrase Extraction with Data Augmentation and Information Filtering

Amir Pouran Ben Veyseh , Nicole Meister , Franck Dernoncourt , Thien Huu Nguyen

Categories: Natural Language Processing

2022-09-11

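Keyphrase extraction is one of the essential tasks for document understanding in NLP. While most prior work is dedicated to formal settings such as books, news, or web blogs, informal texts such as video transcripts are less explored. To address this limitation, in this work we present a novel corpus and method for keyphrase extraction from the transcripts of videos streamed on the Behance platform. More specifically, we propose a novel data augmentation technique that enriches the model with background knowledge about the keyphrase extraction task from other domains. Extensive experiments on the proposed dataset show the effectiveness of the introduced method.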

Model Transparency and Interpretability : Survey and Application to the Insurance Industry

Dimitri Delcaillau , Antoine Ly , Alize Papp , Franck Vermet

Categories: Machine Learning (Statistics) | Machine Learning

2022-09-01

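The use of models, even efficient ones, must be accompanied by an understanding of the data transformation process at all levels (upstream and downstream). There is therefore a growing need to define the relationship between individual data and the choices an algorithm may make based on its analysis (e.g., the recommendation of a product or a promotional offer, or an insurance rate representative of the risk). Model users must ensure that models do not discriminate and that their results can be explained. This paper introduces the importance of model interpretation and addresses the notion of model transparency. Within an insurance context, it specifically illustrates how some tools can be used to enforce control over actuarial models, which nowadays can leverage machine learning. On a simple example of loss frequency estimation in car insurance, we show the value of some interpretability methods for adapting explanations to the target audience.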

Neural Mesh-Based Graphics

Shubhendu Jena , Franck Multon , Adnane Boukhayma

Categories: Computer Vision

2022-08-10

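We revisit NPBG, the popular approach to novel view synthesis that introduced the ubiquitous point-based neural rendering paradigm. We are particularly interested in data-efficient learning with fast view synthesis. We achieve this through view-dependent, mesh-based point descriptor rasterization, in addition to a foreground/background scene rendering split and an improved loss. Training on a single scene only, we outperform NPBG, which was trained on ScanNet and then fine-tuned per scene. We also perform competitively with the state-of-the-art method SVS, which was trained on the full datasets (DTU and Tanks and Temples) and then fine-tuned per scene, despite its deeper neural renderer.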

Learning Generalizable Light Field Networks from Few Images

Qian Li , Franck Multon , Adnane Boukhayma

Categories: Computer Vision | Artificial Intelligence

2022-07-24

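We explore a new strategy for few-shot novel view synthesis based on a neural light field representation. Given a target camera pose, an implicit neural network maps each ray to the color of its target pixel. The network is conditioned on local ray features generated by coarse volumetric rendering from an explicit 3D feature volume. This volume is built from the input images using a 3D ConvNet. Our method achieves competitive performance on synthetic and real MVS data compared with state-of-the-art neural radiance field based methods, while rendering 100 times faster.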
